About task management in HySoP

First, note that the notion of task in HySoP does not refer to the well-known “task parallelism”.

Without tasks, all MPI processes participate in every operation of the simulation. Introducing tasks makes it possible to subdivide the set of processes for some operators in the graph.

There are three main use cases:

  • One of the operators has only a sequential implementation while others can benefit from MPI parallelism.

  • Some operators use only GPUs while others use CPU cores, and the user wants to exploit all the resources of a compute node without sharing a GPU among several MPI processes (architectures with more CPU cores than GPU devices).

  • On-the-fly post-processing of the data on dedicated resources.

Setting up tasks in HySoP is a three-step process:

  1. Specify a proc_tasks array when creating the domain object.

  2. Create the MPIParams using the task specification (task_id and on_task).

  3. Create the operators with the appropriate MPIParams. We strongly advise checking the produced graph in the processes' output (using HYSOP_VERBOSE), especially for the automatically inserted InterTasksRedistributes; see the sketch below.
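
As an end-to-end illustration of these three steps, here is a minimal sketch. The import paths for Box and MPIParams are assumptions based on HySoP examples, and the task ids and placeholder operator are purely illustrative:

# Minimal sketch of the three-step task setup (4-process run assumed).
# Import paths are assumptions taken from HySoP examples.
from hysop import Box
from hysop.tools.parameters import MPIParams

TASK_A, TASK_B = 1, 2  # illustrative task ids

# Step 1: one task per MPI process (rank 0 on TASK_A, the rest on TASK_B).
proc_tasks = (TASK_A, TASK_B, TASK_B, TASK_B)
box = Box(proc_tasks=proc_tasks)

# Step 2: one MPIParams per task.
mpi_params = {t: MPIParams(comm=box.get_task_comm(t), task_id=t, on_task=box.is_on_task(t))
              for t in box.all_tasks}

# Step 3: give each operator the MPIParams of its task, e.g.
#   op = SomeOperator(..., mpi_params=mpi_params[TASK_A])
# then run with HYSOP_VERBOSE set and inspect the generated graph.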

Details and limitations

  1. A task is identified by an integer (HYSOP_DEFAULT_TASK_ID by default). Each process is involved in one or several tasks. For instance, with 4 processes and 2 tasks (1 and 2):

  • One task per rank (disjoint tasks):

    proc_tasks = (1,2,2,2)
    proc_tasks = (2,1,2,2)
    proc_tasks = (1,2,1,2)
    
  • rank 0 works on both tasks. In this case, all items must be iterables. For the moment, only nested tasks with the same root process are supported, with the largest task listed first:

    proc_tasks = ((2,1),(2,),(2,),(2,))
    proc_tasks = ((2,1),(2,),(2,1),(2,))
    

The following are not handled yet:

proc_tasks = ((1,2),(2,),(2,),(2,))  # largest task not in front for rank0
proc_tasks = ((2,),(2,1),(2,),(2,))  # root process is not the same for all tasks
proc_tasks = ((2,1),(2,),(1,),(2,))  # non-nested tasks
proc_tasks = ((1,),(1,),(1,2),(2,))  # non-nested tasks

With nested tasks, domain.current_task() always returns the first task in a process's tasks list.

Note that configurations with more than 2 tasks have not been tested.
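
For runs whose process count is not fixed in advance, proc_tasks can be built programmatically from the communicator size. A minimal sketch with mpi4py, reproducing the layouts above:

# Build proc_tasks from the world communicator size (sketch using mpi4py).
from mpi4py import MPI

nprocs = MPI.COMM_WORLD.Get_size()

# Disjoint layout: rank 0 on task 1, every other rank on task 2.
proc_tasks = tuple(1 if r == 0 else 2 for r in range(nprocs))

# Nested layout: every rank on task 2, rank 0 additionally on task 1
# (same root process and largest task first, as required above).
proc_tasks = tuple((2, 1) if r == 0 else (2,) for r in range(nprocs))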

  2. In case of disjoint tasks, a single MPIParams can still be used:

box = Box(proc_tasks=proc_tasks)
mpi_params = MPIParams(comm=box.task_comm, task_id=box.current_task(), on_task=True)

Otherwise, one MPIParams per task is recommended:

mpi_params = {}
for t in box.all_tasks:
    mpi_params[t] = MPIParams(comm=box.get_task_comm(t), task_id=t, on_task=box.is_on_task(t))

  3. Automatic inter-task communication is performed during graph building. Consider the following situation, with 4 operators (op1, op2, op3 and endA) working on 3 fields (A, B and C) and 2 tasks (taskA and taskB):

# TaskA      |   TaskB
# op1 A->A   |
# op2 A->B   |
#         `--B--,              // Inter-task communication step
#            |  op3 B,C->A
#         ,--A--'              // Inter-task communication step
# endA  A->A |
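
In code, this situation amounts to giving op1, op2 and endA the taskA MPIParams and op3 the taskB MPIParams before building the graph. A hedged sketch, where SomeOp stands for any HySoP operator (its invars/outvars keywords are placeholders) and the Field/Problem usage follows HySoP examples:

from hysop import Field, Problem  # assumed import paths

# Fields A, B and C live on the task-aware box defined above.
A = Field(domain=box, name='A')
B = Field(domain=box, name='B')
C = Field(domain=box, name='C')

# Only the mpi_params wiring matters here; SomeOp is a placeholder.
op1  = SomeOp(invars=[A],    outvars=[A], mpi_params=mpi_params[TASK_A])
op2  = SomeOp(invars=[A],    outvars=[B], mpi_params=mpi_params[TASK_A])
op3  = SomeOp(invars=[B, C], outvars=[A], mpi_params=mpi_params[TASK_B])
endA = SomeOp(invars=[A],    outvars=[A], mpi_params=mpi_params[TASK_A])

problem = Problem()
problem.insert(op1, op2, op3, endA)
problem.build()  # the two inter-task redistributes (B, then A) are inserted here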

Note: without the endA operator, the second communication is not automatically inserted, because field A is an output for taskA and cannot be invalidated by op3 on another task. Inter-task invalidation in graph building is not yet implemented.

Note that this algorithm does not work with overlapping tasks. This is because A is both an input and an output in taskA. A workaround is to use another field A’, with endA then being a copy, as sketched below.
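
The workaround, sketched with the same placeholders (A_prime plays the role of A’, and CopyOp stands for whatever operator performs the copy):

# op3 now writes into A_prime instead of A, so that A is no longer both
# an input and an output of taskA.
A_prime = Field(domain=box, name='A_prime')
op3  = SomeOp(invars=[B, C], outvars=[A_prime], mpi_params=mpi_params[TASK_B])

# endA becomes a plain copy A_prime -> A on taskA.
endA = CopyOp(invars=[A_prime], outvars=[A], mpi_params=mpi_params[TASK_A])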

Automatic communications are inserted on task changes along the operator list. The appropriate fields are found by iterating through the previous and next operators.